Syntax Deep Explorer

Authors

  • José Correia
  • Jorge Baptista
  • Nuno J. Mamede

Abstract

The analysis of co-occurrence patterns between words makes it possible to understand the use (and meaning) of words that are associated through different relations. Quantifying these patterns is a powerful tool in modern lexicography, as well as in building basic linguistic resources for natural language processing or for language learning. The aim of this project is to develop a tool, based on the STRING natural language processing chain, that allows users to explore co-occurrence data obtained from Portuguese texts. Several existing tools, such as DeepDict, SketchEngine and Wortschatz, already provide information on the co-occurrence patterns of a word in Portuguese corpora. These tools are based on different natural language processing systems and adopt different association measures: Mutual Information, the Dice coefficient, the log-likelihood ratio, or variants of these base measures. The proposed solution has two components: a co-occurrence extractor and a web interface. The extractor takes a corpus processed by STRING, identifies the co-occurrences and stores them in a database; the different association measures are then calculated for each stored co-occurrence. The web application provides users with an interface for exploring these co-occurrence patterns. The solution is evaluated in terms of the time needed to extract the co-occurrences from the CETEMPúblico corpus, the storage space and organization of the database, and the response time of the web interface. The resulting tool provides quick access to co-occurrences collected from corpora processed by STRING, taking advantage of the chain's rich lexical resources and its sophisticated syntactic and semantic analysis to produce results that the systems mentioned above do not offer.
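The abstract names the association measures but does not state the exact formulas or variants the tool uses. As a minimal sketch, assuming co-occurrences are available as (head, dependent) word pairs (a hypothetical input format, not the tool's actual database schema), the textbook definitions of pointwise Mutual Information, the Dice coefficient and Dunning's log-likelihood ratio can be computed from pair and marginal frequencies as follows. Here N is the number of pair tokens, and the marginals f(x) and f(y) count how often a word appears in the first and second position; the function name `association_scores` is illustrative only.

```python
import math
from collections import Counter


def association_scores(pairs):
    """Compute PMI, Dice and log-likelihood (G^2) for each distinct (x, y) pair.

    `pairs` is a list of co-occurrence tokens such as (head, dependent) tuples.
    This input format is a hypothetical stand-in, not STRING's actual output.
    """
    pair_freq = Counter(pairs)              # f(x, y)
    x_freq = Counter(x for x, _ in pairs)   # f(x): first-position marginal
    y_freq = Counter(y for _, y in pairs)   # f(y): second-position marginal
    n = len(pairs)                          # N: total number of pair tokens

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        f_x, f_y = x_freq[x], y_freq[y]
        # Pointwise mutual information, in bits.
        pmi = math.log2(n * f_xy / (f_x * f_y))
        # Dice coefficient: joint frequency relative to both marginals.
        dice = 2 * f_xy / (f_x + f_y)
        # 2x2 contingency table of observed counts (Dunning 1993).
        observed = [
            [f_xy, f_x - f_xy],
            [f_y - f_xy, n - f_x - f_y + f_xy],
        ]
        row_totals = [f_x, n - f_x]
        col_totals = [f_y, n - f_y]
        g2 = 0.0
        for i in range(2):
            for j in range(2):
                o = observed[i][j]
                if o > 0:  # the term 0 * log(0) is taken as 0
                    expected = row_totals[i] * col_totals[j] / n
                    g2 += o * math.log(o / expected)
        scores[(x, y)] = {"pmi": pmi, "dice": dice, "llr": 2 * g2}
    return scores


if __name__ == "__main__":
    # Toy verb-object pairs, invented for illustration only.
    sample = [
        ("beber", "café"), ("beber", "café"), ("beber", "água"),
        ("comer", "pão"), ("comer", "pão"), ("comer", "café"),
    ]
    scores = association_scores(sample)
    # PMI of (comer, pão) is log2(6 * 2 / (3 * 2)) = 1.0
    print(scores[("comer", "pão")])
```

Computing the measures once per stored pair, rather than at query time, matches the abstract's description of calculating them after extraction; a real implementation would read the counts from the database instead of an in-memory list.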




Publication date: 2016